Kusefo/. BioMedical Engineering OnLine 2013, 12:94 
http://www.biomedical-engineering-online.eom/content/1 2/1 /94 



bin 

E BioMedical Engineering 
OnLine 



RESEARCH 



Open Access 



Multivariate matching pursuit in optimal 
Gabor dictionaries: theory and software with 
interface for EEG/MEG via Svarog 



Rafal Kus, PiotrTadeusz Rozariski and Piotr Jerzy Durka* 



•Correspondence: 

durka@fuw.edu.pl 

Faculty of Physics, University of 

Warsaw, ul. Hoza 69, 00-681 

Warszawa, Poland 



Abstract 

Background: Matching pursuit algorithm (MP), especially with recent multivariate 
extensions, offers unique advantages in analysis of EEG and MEG. 

Methods: We propose a novel construction of an optimal Gabor dictionary, based 
upon the metrics introduced in this paper. We implement this construction in a freely 
available software for MP decomposition of multivariate time series, with a user friendly 
interface via the Svarog package (Signal Viewer, Analyzer and Recorder On GPL, http:// 
braintech.pl/svarog), and provide a hands-on introduction to its application to EEG. 
Finally, we describe numerical and mathematical optimizations used in this 
implementation. 

Results: Optimal Gabor dictionaries, based on the metric introduced in this paper, for 
the first time allowed for a priori assessment of maximum one-step error of the MP 
algorithm. Variants of multivariate MP, implemented in the accompanying software, are 
organized according to the mathematical properties of the algorithms, relevant in the 
light of EEG/MEG analysis. Some of these variants have been successfully applied to 
both multichannel and multitrial EEG and MEG in previous studies, improving 
preprocessing for EEG/MEG inverse solutions and parameterization of evoked potentials 
in single trials; we mention also ongoing work and possible novel applications. 

Conclusions: Mathematical results presented in this paper improve our 
understanding of the basics of the MP algorithm. Simple introduction of its properties 
and advantages, together with the accompanying stable and user-friendly Open 
Source software package, pave the way for a widespread and reproducible analysis of 
multivariate EEG and MEG time series and novel applications, while retaining a high 
degree of compatibility with the traditional, visual analysis of EEG. 

Keywords: Matching pursuit, EEG, MEG, Time-frequency, Gabor dictionary, Metrics 



Background 

Since the first application to EEG in 1995 [1], matching pursuit algorithm (MP) has been 
shown to significantly improve the EEG/MEG analysis in a variety of paradigms, including 
pharmaco-EEG [2,3]. assessment of propagation [4], dynamics [5] and signal complexity 
[6-8] in epileptic seizures, detection of seizures [9,10], analysis of somatosensory evoked 
potentials in humans [11] and rats [12], detection of sleep spindles in Obstructive Sleep 
Apnea [13] and investigation of their chirping properties [14], studies of high gamma in 
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humans [15] and monkeys [16], investigation of brain's pain processing [17,18], parama- 
terization of vibrotactile driving responses [19] and event-related desynchronization and 
synchronization [20,21]. 

New area of applications opened with the advent of multivariate MP (MMP) algorithms. 
MMP preprocessing was shown to significantly improve stability and sensitivity of EEG 
inverse solutions [22-27] and allowed for tracing evoked responses in single trials of EEG 
and MEG [28-31]. 

Finally, the algorithm offers also unique compatibility with the traditional, visual 
analysis of EEG. Specific mode of operation of MP, which is sequential focusing on 
locally strongest ("most visible") signal structures, resembles the working of an electroen- 
cephalographer who visually evaluates the EEG time series. Proper interpretation of the 
MP parameterization can provide a direct link to the results of visual analysis of EEG 
[32]— that is, we may find a direct correspondence between the waveforms fitted to the 
EEG time series, and the structures marked by visual scorer, including sleep spindles, 
slow waves or epileptic spikes [33-35]. This advantage should not be underestimated in 
the field, where most of our knowledge about behavioral and neurological correlates of 
EEG comes from visual analysis, which is still the only golden standard and point of refer- 
ence: according to the Report of the American Academy of Neurology and the American 
Clinical Neurophysiology Society [36], quantitative EEG analysis should be used only by 
physicians highly skilled in clinical EEG, and only as an adjunct to and in conjunction with 
traditional EEG interpretation. 

In spite of all these advantages, PubMed search of "matching pursuit and EEG" still 
returns only a two-digit number. This limited acceptance of such a promising method 
may be partly due to the lack of: 

1. Well defined criteria for setting the most important parameters of the algorithm, 
which are the number and distribution of dictionary's functions. 

2. Common framework for a variety of multivariate MP algorithms. 

3. Robust and user friendly software based upon solid mathematical foundations. 

Following sections address these issues from two different angles. Next section An 
interactive tour of matching pursuit provides plain English introduction of the major 
aspects and parameters of the algorithm, based on example computations. Following 
sections use equations to introduce optimal sampling of Gabor dictionaries and a com- 
mon framework for multivariate MP (MMP), and Appendix discusses major numerical 
and mathematical optimizations used in the MMP implementation accompanying this 
paper. 

An interactive tour of matching pursuit 

[Additional file 1] is a video tutorial for downloading and configuration of the Svarog 
package, used for computations and visualization of results (Figures 1, 2, 3, 4, 5 and 6 
are screenshots from Svarog). Screencast in [Additional file 2] shows the steps from 
loading the signal to displaying interactive map of time-frequency energy distribution. 
[Additional file 3] contains help of the MP module from Svarog. Complete software 
environment used for computations presented in this paper (including examples from 
the following sections) is freely available from http://braintech.pl/svarog, as described in 
section Software availability and requirements. 
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Figure 1 Sample epoch of sleep EEG. Screenshot displaying an epoch of sleep EEG recording. SWA and 
sleep spindles were marked by an electroencephalographer as green and gray rectangles, correspondingly 
(in this case both spindle tags fall inside the epochs marked as SWA). Blue dashed line outlines the epoch 
from C3 selected for MP decomposition, shown in Figure 2. 
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Figure 2 Time-frequency distribution of signal's energy. Results of MP decomposition displayed as an 
interactive time-frequency of map signal's energy density in Svarog. Clicking center of a blob (marked by 
white cross) adds the corresponding function to the reconstruction (bottom signal). 
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Figure 3 Defining a filter for SWA. Filter defining criteria for waveforms corresponding to SWA in the 
Svarog interface to MP. 



The algorithm 

The gist of the matching pursuit algorithm can be summarized as follows: 

1. We start by creating a huge, redundant set (called a dictionary) of candidate 

waveforms for representation of structures possibly occurring in the signal. For the 
time-frequency analysis of signals we use dictionaries composed of sines with 
Gaussian envelopes, called Gabor functions, which reasonably represent waxing 
and waning of oscillations. 
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Figure 4 Structures corresponding to SWA. Result of the application of the filter from Figure 3 to the 
decomposition from Figure 2. 
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Figure 5 Structures corresponding to sleep spindles. Result of application of the filter defining sleep 
spindles to the decomposition from Figure 2. 
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Figure 6 Setting the dictionary parameters for MP decomposition in Svarog. Tab for setting the 
parameters governing construction of the Gabor dictionary for MP decomposition in Svarog. 
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2. From this dictionary we choose only those functions, which fit the local signal 
structures. In such a way, the width of the analysis window is adjusted to the local 
properties of the signal. Local adaptivity of the procedure is somehow similar to the 
process of visual analysis, where an expert tends to separate local structure and 
assess their characteristics. Owing to this local adaptivity, MP is the only signal 
processing method returning explicit time span of detected structures. 

3. The above idea is implemented in an iterative procedure: in each step we find one 
"best" function, and then subtract it from the signal being decomposed in the 
following steps. 

MP advantages in EEG 

We will discuss some of the advantages of MP in EEG analysis using an example from the 
field of sleep research. As mentioned in section Background, visual analysis of EEG is still 
the golden standard; in the area of sleep the basic reference [37] comes from 1968 (with 
later updates [38]). It defines criteria for division of sleep into stages, based mostly upon 
presence/prevalence of certain structures in the corresponding epochs of EEG recording. 
As for the definitions of these structures, formulated for standardization of their visual 
detection, let us take the example of sleep spindles: 

The presence of sleep spindle should not be defined unless it is at least 0.5 sec. duration, 
i.e. one should be able to count 6 or 7 distinct waves within the half-second period. (...) 
The term should be used only to describe activity between 12 and 14 cps [37]. 

Over the years, common definition drifted towards frequencies from 11 to 15 Hz, dura- 
tion 0.5-2 seconds and amplitude usually above 15 fiV. Reading this definition after 45 
years, we are still surprised that: 

1. Criteria are defined almost explicitly in time-frequency terms. 

2. Before 1993 (that is before the introduction of the matching pursuit [39]), no signal 
processing method returning explicitly the frequency, amplitude and time width of 
the oscillations present in a signal was known. 

In the following we present MP decomposition of an EEG epoch containing sleep 
spindles and slow wave activity (SWA). Figure 1 presents a sample epoch of sleep 
EEG loaded into Svarog (Signal Viewer, Analyzer and Recorder on GPL, see section 
Software availability and requirements). Green and gray rectangles represent SWA and 
sleep spindles, marked visually by an encephalographer. 

Figure 2 presents results of MP decomposition of the epoch selected in Figure 1. Curves 
below the time-frequency map of energy density represent: 

1. Original signal chosen for decomposition (marked in Figure 1). 

2. Reconstruction (time course) of selected functions (marked by circled crosses). 

Central panel holds the time-frequency map of signals energy density. Each function 
fitted to the signal by MP is presented as a Gaussian blob in the appropriate time and 
frequency coordinates, with time and frequency widths corresponding to its parameters. 
Formula for computing this representation is given by Eq. (4). One can argue about the 
superior properties of this estimate; indeed, there is a lot of possible ways to estimate sig- 
nals energy density in the time-frequency plane (spectrogram, wavelets, Wigner-Ville etc.) 
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and none of them is perfect for all signals. Therefore, we shall concentrate on a unique 
feature of the MP decomposition, which is an explicit parameterization of the transients 
present in a signal. 

Each of the blobs in Figure 2 represents a Gabor function of known time position and 
width, amplitude, frequency and phase, as in Eq. (3). These parameters contain the pri- 
mary information about the signal's content (Eq. (2)), and they can be used directly for 
the investigation of the properties of the signal, as exemplified below. 

Let us define the occurrence of SWA as a wave of amplitude above 50 fiV, width above 
half a second and frequency from 0.5 to 4 Hz. Definition of such filter in Svarog is pre- 
sented in Figure 3. The amplitude is entered as 25 jiV, because the encephalographic 
convention relates to the peak-to-peak amplitude, which for the Gabor function is double 
of the mathematical amplitude. This convention is not the only difference between math- 
ematics and visual perception of structures in the signal. For example, visual perception 
of both amplitude and time width depends on the context (mostly the variance of the sig- 
nal). Therefore, if exact replication of the visual detection is the main goal, factors like 
S/N ratio can be incorporated into the post processing criteria. 

Result of the application of the filter from Figure 3 to the decomposition from Figure 2 is 
given in Figure 4. Automagically, all that is left are structures conforming to the definition 
of SWA. Two major advantages of this result over previously available methods should be 
noted: 

1. This selection takes into account all the features defining SWA, not only their 
frequency content as would be the case e.g. for a bandpass filter. 

2. We have separate parameterization of all the conforming structures, including e.g. 
the duration of each of them. 

The latter feature was explored in [40] for detection of sleep stages III and IV. These 
stages are defined as epochs occupied by SWA in 20-50% and over 50% of their 
time, correspondingly. As the previously employed signal processing methods did no 
parametrize explicitly the time span of relevant activity, it was the first explicit implemen- 
tation of this rule, designed for standardization of visual detection, in a fully automatic 
system. We observe also a good concordance of these results with visual detection, 
represented by the two green marks (intersected by the gray marks for spindles) repre- 
senting visually tagged SWA in Figure 1. This concordance was quantitatively evaluated 
in [34]. 

Similar operation can be performed for sleep spindles; application a filter reflect- 
ing their time-frequency parameters, quoted at the beginning of this section, gives us 
Figure 5. Again, structures detected by the algorithm fall within the borders marked by 
enecephalographer for spindles (gray tags in Figure 1). We observe also a typical example, 
when two superimposed spindles (on the left) were marked by the human expert as one. 
It exemplifies the fact that MP-based methods are in most cases downward compatible 
with visual detection, yet, apart from repeatability and automatization offer also increased 
sensitivity and resolution. Indeed, in the noisy signal it is almost impossible to see the 
boundary, and human brain did not evolve for an online calculation of instantaneous 
frequency, which differentiates these two spindles. 

Similarly to the detection of SWA, this scheme is not only more sensitive and selec- 
tive compared to previous approaches, while retaining a high degree of concordance with 
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visual detection [33,41], but also allows us, for example, to explicitly count the number of 
spindles occurrences in any given epoch — a parameter also relevant in sleep analysis. 

Finally, it is worth noting that the above proposed procedure is essentially free from any 
method-dependent settings, like the choice of the mother wavelet in wavelet transform, 
window width in spectrogram, or smoothing kernel in Wigner-Ville derived represen- 
tations. All these parameters can significantly alter the results of analysis, and optimal 
settings can be different for each analyzed epoch. On the contrary, matching pursuit 
decomposition as such is "more or less" uniquely defined by Equation (1). However, subtle 
relations between dictionaries and results of MP decomposition were not fully under- 
stood so far— their explanation constitutes the main mathematical result presented in this 
paper. So for the rest of this section let us concentrate on the "more or less". 

Parameters of MP decomposition 

As introduced in section MP algorithm, MP searches for functions fitting the signal in a 
large and redundant set called dictionary. The bigger (more redundant) this dictionary is, 
the better the chance of a perfect fit. Although it's a vague statement, nothing substan- 
tially more precise had been said on the relation of the dictionary size and quality of MP 
decomposition— until now. 

This study introduces a novel construction of the dictionary, in which the inner prod- 
ucts of the adjacent atoms are kept constant. In other words, distribution of dictionary's 
functions is uniform with respect to the metric related to their inner product. Using this 
construction gives us control over the maximum error of a single MP iteration, measured 
in terms of energy (product of the signal and a function from the optimal dictionary). 
Upper bound on this error is given by the worst case, where a structure present in the sig- 
nal falls "in the middle" between dictionary's functions available for decomposition. This 
error is not equivalent to the resolution of approximation, because MP is an iterative, 
nonlinear procedure. Formal introduction of the mentioned metric and construction of 
the optimal dictionary are given in section Optimal sampling of Gabor dictionaries. 

Figure 6 presents the "Dictionary" tab from the MP configuration window. The most 
important parameter (first from the top) is called "Energy error". It corresponds to e from 
Equation (7) in section Optimal sampling of Gabor dictionaries, and relates directly to the 
mentioned above maximum error of a single MP iteration. 

On the practical side, this parameter regulates the price/performance tradeoff in MP 
decomposition. Smaller (closer to zero) values result in larger amount of functions in the 
dictionary and higher resolution of resulting MP decomposition, at a price of increased 
computation times and memory requirements. After each change of this parameter the 
window displays the amount of RAM that will be occupied by the corresponding dictio- 
nary. Keeping "Energy error" constant for analysis of epochs of different sizes will result 
in larger dictionaries for longer epochs, but the accuracy of decomposition, related to the 
density of the dictionary, will be the same except for the border effects. 

Another issue related to the dictionary is a possible statistical bias, present in decom- 
positions of many epochs in the same, relatively small dictionary. This problem was dis- 
cussed in [42], where solution in terms of stochastic dictionaries was proposed. Stochastic 
dictionaries introduced in [42] were constructed by explicit random drawing of functions 
parameters from defined ranges (subsection Uniform sampling). This approach gave less 
control over the structure of the dictionary and excluded possibilities of several valuable 
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numerical optimizations. In this paper we propose a different approach, where random- 
ization is achieved by random removal of a defined fraction of functions from a dense, 
structured dictionary. This procedure is applied when user chooses "OCTAVE_STOCH" 
as "Dictionary type". In such case, a dictionary is first created according to the param- 
eter 'Energy error" (e from Equation (7)), and then selected fraction of functions is 
randomly removed from the dictionary. This fraction is governed by the parameter 
"Percentage chosen", so that (100 — Percentage chosen)% of functions is removed. 

Finally, the last of the major parameters sets the number of matching pursuit iterations, 
which is equivalent to the number of functions chosen for the representation (compare 
Equation (2)). Owing to the convergence property of MP, they are ordered by decreas- 
ing energy. Increasing the number of iterations does not influence the parameters of 
functions chosen in earlier steps. 

That is, if we make two separate decompositions of the same signal epoch and using the 
same dictionary— the first one with 50 iterations and the second one with 10 iterations- 
then the result of the first decomposition will be the same as taking the first 10 functions 
from the second decomposition (however, in case of stochastic dictionaries, these two 
decompositions will be performed using slightly different dictionaries). This difference is 
more pronounced for smaller dictionaries. 

There are several mathematical criteria for stopping the decomposition; their influence 
was evaluated e.g. for MP-based descriptors of signal complexity [6] or optimal video cod- 
ing [43] — general discussion of this parameter can be found in [44]. Software presented 
in this paper implements the two most basic options, based upon the maximum number 
of iterations ("Max iterations") and percentage of signal's energy explained by the whole 
decomposition ("Energy percent"). These options work in logical conjunction. Hence e.g. 
for the default settings (99% and 50 iterations), the procedure is completed after the 50th 
iteration— or before, if the representation (2) explains 99% of energy of the original signal 
for a lower number of functions M. 

Detailed information on the parameters of MP decomposition employed in mp5 is 
available from [Additional file 3], which contains the help of the mp5 module from Svarog. 

Optimal sampling of Gabor dictionaries 

We start by formally reintroducing the MP following the notation from the seminal paper 
by Mallat and Zhang [39]. Denoting the function fitted to the signal* in the «-th iteration 
of MP asg Yn , and «-th residual as R n x, we define the procedure as: 

R°x = x 

R n x=(R n x,g Yn )g Yn +R n+l x (1) 
g Yn = arg max &[€fl | {R n x,g Yl )\ 

As a result we get an expansion 

M-l 

x^J2 ( R " x ' Sy n )gy n (2) 

where M equals the number of iterations. For a time-frequency analysis of real-valued 
signals, dictionary D is usually composed from Gabor functions 

gy {t) = K{Y)e- 71 ^ 2 cos (w(t -u) + 4>) (3) 

where y = (u, co, s) and K(y) is such that \ \g Y \ \ = 1. 
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From expansion (2) we can derive a time-frequency distribution of the signal's energy 
(as shown in Figure 2), by summing Wigner distributions W of selected functions and 
explicitly omitting cross-terms: 

M 

Ex{t, co) = J2\ ( R " x ' Sm) I 2 Wg yn (t, co) (4) 

n=0 

This magnitude is presented in Figures 2, 4 and 5. 
Real world implementations of Gabor dictionaries 

Parameters (u, co, s) of functions g Y from Equation (3) form a 3-dimensional continuous 
subspace of K 3 — the infinite set-Doo. Ranges of parameters delimiting this subspace cor- 
respond to the usual assumptions that the time center u and time spread 5 do not exceed 
the length of the analyzed epoch, and to the fact that co makes no sense above the Nyquist 
frequency. The fourth parameter <p does not have to be taken into account in the con- 
struction of the dictionary D, since for any g Y we can find the phase cp maximizing its 
product (g y ,x) with given signal* in a single step— we recall this method in the Appendix 
(section Optimal phase of a Gabor function). 

From this D^, a finite dictionary D C must be chosen for any practical implemen- 
tation of MP. This choice is equivalent to the choice of a discrete and finite set y from 
the continuous ranges of u, co, and s. However, up to now the major questions, crucial for 
real-world implementations of MP, were unanswered: 

1. How should we choose the elements of the finite dictionary D from Doo and why? 

2. How much do we gain by increasing the size of the dictionary? 

It has been a well known fact, that using a larger dictionary of functions for decom- 
position of the same signal should lead to more accurate parametrization. However, 
no direct relation between the density of the dictionary and resolution of the resulting 
decomposition was derived so far. While an exact measure of the resolution of the highly 
nonlinear matching pursuit remains an open question, in the following sections we relate 
the maximum error of a single MP iteration to the density of an optimally sampled Gabor 
dictionary, filled with Gabor functions distributed uniformly with respect to the metric 
related to their inner product. 

Dyadic sampling 

The first, wavelet-like subsampling of Doo was proposed in the seminal paper [39]: 

Y = (u,co,s) = (pa'Au, k^-, al\ (5) 
V a) ) 

where k, p € Z and the basic intervals for time and frequency (Aw and Aco) are chosen 
so that Au = ^ < 1. If a dictionary D is constructed from the functions g Y with param- 
eters y = (u,co,s) chosen according to (5), then for any signal* € L 2 (R) there exists an 
optimality factor a > 0 such that 

sup \ {x,g Y ) \ > a sup \{x,g)\ (6) 

gy SO g^Dao 

However, there were no quantitative estimates for the performance of the MP in given 
dictionary, constructed for given Aco and a, and it seems that nothing can be said about 
the optimality factor a except for that it exists and is greater than zero. 
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Uniform sampling 

The quest for a justification of a particular scheme of subsampling Doo leads us to the 
considerations of a uniform sampling. If we know that the parameters (u, co, s) are uni- 
formly sampled with steps Au, Aco, and As, then for any g e there exists g Y e D, 
which has the parameters differing at most by halves of the sampling steps. Such sam- 
pling provides an estimate of one-step resolution of the highly nonlinear MP algorithm in 
the space of the parameters of Gabor functions. It was implemented in the mp4 software 
package [45] as a byproduct of the first approach to the issue of stochastic dictionaries 
[42], where parameters of the dictionary's functions were drawn from uniform probability 
distributions. 

However, such a sampling scheme in some cases is far from optimal for the MP algo- 
rithm. For example, constant sampling of the time interval Au for all the widths 5 will 
result in a dictionary containing a lot of strongly overlapping Gabors with large widths s 
(which will give large inner products with neighbors from the dictionary), and much more 
sparse coverage of positions of "shorter" Gabors with small s. Choice (1) of functions for 
the MP expansion (2) is based entirely on inner products, and the Cartesian metric in the 
space of the parameters of these functions is far from optimal in this respect. This leads 
us to the search for a metric that would correspond to the MP selection procedure (1). 

In pursuit of a relevant metric 

A metric that would reflect the distance between the dictionary elements as "seen" by 
the MP algorithm should be based on the inner product of dictionary's elements. We 
propose a metric d(go,gi) to ensure that the distance between nearest Gabor atoms in the 
dictionary does not exceed some a priori given threshold, e. Similarly to dyadic dictionary 
(see Eq. 5), we assume that the scale parameter 5 varies by a factor a (dilation factor), as 
follows: Sj = a', where j > 0 and a > 1 . However, we will propose a new method for 
determination of step values Aco and Au. We will show that this parametrization allows 
us to derive such step values, that for every (5, co, u) 

d(g(s,m,u,<p),g(as,a),u,<t>)) < 

d(g( s ,u),u,<t>),g(s,u)+\m,u,4>)) < e (7) 
d(g( s ,(o,u,<t>)'g(s,a>,u+CM,<i>)) < e • 

In order to construct a proper metric d(go, gi), we start by introducing a simple, intuitive 
measure for Gabor function similarity which satisfies all the properties of a metric: 

d 0 (go,gi) = 2"J \\g 0 = 2~ 1 2^(g 0 -gi\g 0 -gi) = yjl- (go[gi)- ( 8 ) 

Norm do above is naturally defined based upon the inner product in the space of the real 
signals. The constant 2~ 2 has been introduced to set the distance between orthogonal 
Gabor atoms to 1, while greater distances (up to 2) can exist between Gabor atoms whose 
inner products are negative. 

Inner product of two Gabor functions has the dimension of energy, thus the dimension 
of the metric (8) is amplitude. According to this fact, the conditions (7) imply the dimen- 
sion of the square of the parameter e as the quantity of energy. Also, (Go|Gi) max will be 
nonnegative for any two Gabor functions. Therefore, since Gabors are normalized, the 
value of their inner product is limited to the range (0, 1) and thus the limit of the e is 
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(0, 1). However, e = 1 allows the neighboring atoms to be orthogonal, so for a reasonable 
MP decomposition one should require e <§C 1. 

To the best of our knowledge, the above measure has not been previously applied to the 
Gabor dictionary construction problem, although a similar one for packing in projection 
spaces was introduced by Tropp [46]. Interpretation and possible values of parameter 
£ will be discussed in the following section, after we construct a final metric based on 
do(go,gi)- 

Phase-related equivalence 

According to the section Optimal phase of a Gabor function from the Appendix, one 
can easily calculate phase (f> of the Gabor function that maximizes its inner product with 
a given signal. Therefore, each atom in MP dictionary can be replaced with any other 
representative from the set G( S)ft))M ), which is an equivalence class in respect to a relation ~ 
defined as 

g(s 0 ,co a ,uo4o) ~ S(si,an,ui4>l) ( 5 0 = Si) A (&>o = W\) A (u 0 = U\) . (9) 

This feature has to be taken into account by introducing a correct metric for Gabor 
atoms 

d(go,gi)=d(g 0 /~,giM, (10) 
where d(Go, Gi) is the distance function defined on equivalence classes 



d(G 0 ,Gi) = min d 0 (g 0 ,gi) = jl - max(g 0 |gi) = -\A ~ (Go|Gi) max , ( n ) 
goeGo gosGo 

and we introduce the maximal scalar product (Go|Gi) max , which can be calculated as 

(G(s 0 ,w 0 ,u 0 ) |G : (s 1 , (Ul ,H 1 ))}max = P^J, ,(ff(«O>»O»«O,0o)l^(si»<wi»«l>0l)) • (1^) 

</>o,0i e[0;2jr] 

Inner products of Gabor functions 

The measure do (8) is based on an inner product of Gabor functions, so it is necessary 
to calculate analytical formulae for this product. Such expressions can be found for real 
Gabor functions 



,2 



g Y (t) =K v e- 71 ^ cos(co(t - u) + <t>) , (13) 
where y = (s, co, u, 0) is a set of the parameters, and 



2 3/4 

Ky = ^s~ 



1 + cos(20)e ^ 



i 



(14) 



is the normalization factor. 

To present an analytical formula for the inner product of two real Gabor functions 
(13) go and g\ with normalization factors Ko and K\ respectively, we introduce following 
constants: 

A = it I 4 + 4 ] (15) 







[r 


i) 




Ul 


u 


f 7 




s l 



5=T (16) 
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C=-n(4 + 4) + ABi = -n i ^f. (17) 

\4 4 4 + 4 



The complete formula for the product has been adapted from [47]: 

(go^i) =K 0 K 1 J —e^ I cos[(ft) 0 + coi)B + (0o + <Pi) - (<»owo + &>iwi)]e <w + 



+ cos[(c«o - + (0o - 0i) - (<wo«o - (oim)] e f 

(18) 

Products of adjacent atoms 

To simplify the inner product (18) for special cases of adjacent dictionary atoms, we will 
calculate distances between two atoms which differ in only one of the three parameters 
s, to, u. Let us discuss each case separately. 

Scale 

Because uq = u\, the scalar product is invariant to a shift in position u, so we can safely 
choose u = 0. In this case A = jr( ^ 2 ^' 1 - ) , B = 0, C = 0. Therefore, 



iS(s,(0,u=0,(f>o) \S(as,o},u=0,<t>i)} — 



la cos(0o - 0i) + cos(0 o + 0i)e (a +l)7t 



a 2 + 1 I / 

1 + cos(20n)e"T5rV 1 + cos(20i)e sr~ 



(19) 



The global maximum of this product will be present for 0o = <p\ = 0 or 0n = 0i = |-, 
depending on parameter values. Therefore, one can calculate 



( G(s,(0,uj G(as,a>,u) ) max = y' q2 ~ ^ max 



a 2 s 2 (o 2 a 2 s 2 (o 2 

l — e (« 2 +i)ir 1+e (" 2 + r >" 



5 2 6> 2 / a 2 s 2 oj 2 / S 2 fa) 2 / fl^ 

l — e ^yi-e 2^ yl+e 2*Yl+ e 



3i 

(20) 



Frequency 

The frequency parameter w changes by A&), so = si and «o = Mi. We can choose u = 0, 
analogically to the previous case. Here A = , -B = 0 and C = 0, so the product is as 
follows: 

_ S 2 6J(6>+ A6>) 

Aoj 2 s 2 cos(0 o - 01 ) + cos(0 o + 0i)e 211 
(g(s, m ,K=O,0 o )lg(s,(O+A£ti, M =O,0i)) = e 81 



/ s 2 a> 2 / s 2 (m+Am) 2 

V 1 + cos(20o)e 2 " y 1 + cos(20i)e 2" 

(21) 

The global maximum of this product will be present for 0 O = 01 = f • Therefore, 



Ao> 2 s 2 1 — e 2ir 

(G( Sj(U)M ) I G( S)ft)+ Ao>,M))max = e 8lr ; — ; ( 22 ) 

/ s 2 m 2 / s 2 ( m+Am )2 

y 1 — e 2^ Y 1 — e 2jt 
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Position 

In this case, so = s\, coq = &>i, and position parameter u differs by Au. We can choose 
u 0 = -^ and Ml = ^.Applying A = ^-,B = 0,C = -^- lead to 



-s^p cos(0o - 4>i + coAu) + cos(^o +<Pi)e 2 * 

V 1 + cos(2(/>o)e 2» yl + cos(2(/>i)e 2* 

(23) 

The global maximum of the product will appear for cj>o = —(pi. Therefore, 

_shl 

»y cos(20 + coAu) + e 25 
(G( s ,a>,u)\G(s,co,u+Au))max = e * 2 max . (24) 

0€[O;2.] l + cos(2(/) ) e - S TF- 

Construction of the optimally sampled dictionary 

Formulae (20), (22) and (24) substituted into (7) could be used to construct an optimal 
dictionary. However, to improve computational performance of MP algorithms, one can 
use Fast Fourier Transform (FFT) as described in the Appendix. Therefore, it would be 
preferable to have values of Aa> and Au independent of co, yet still satisfying (7). 

To construct such "uniform" dictionary, step values {a, Aw, Au) must be selected for 
such co that minimizes the value of the maximal scalar product (12). Exemplary step values 
obtained from (20), (22) and (24) are shown in Figure 7, as a function of frequency co, 
for three different values of parameter e. It can be observed that in order to obtain a, 
Aco, Au fulfilling the condition (7) and independent of co, one has to select step values 
corresponding to a large value of frequency co. 

Moreover, at frequency co R» jt, one can safely substitute in Equations (20), (22) and (24): 

e -T5T 0 

e s % 0 

_ S 2 ftj(6>+Aftj) 2 

e 2" s» 0 . 

With these approximations, maximal scalar products can be calculated easily as 



,oj,u) I G( aSt( ,),u) ) max ' 
(G(s,to,u)\G( s ,a}+A(i),u))m!ix ^ £ 



2a 



a 2 + l 

(26) 



(G(s,a),u)\G( s ,a),u+&u))max K £ 2s2 i 

taking into account that max0 e [o ; 27r] cos(2(/> + coAu) = 1. 

Therefore, the optimal step values independent of co can be calculated from Equations 
(7) in the following steps: 

1. For a given threshold e (see Eq. 7) one can calculate the dilation factor a: 

1 + eV(2-e- 2 )(e- 4 -2 e 2 + 2) 

a = T5 (27) 

(1 - eif 
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Figure 7 Optimal scale factor, frequency step and position step. Optimal scale factor a (top plot — see 
Eq. (20)), frequency step Aai (middle plot — see Eq. (22)) and position step Au (see Eq. (24)) as a function of cu 
for three values of e and scale s = 10. 



The set of scales Si, 52, . . . Sn s of Gabors are determined as: 

Sj = a' , (28) 

where / e Z+, N s = |_l°g a is the number of different scales and N is the length 
of the analyzed signal. 

For each scale s obtained from Eq. (28) the step Aft) in frequency domain and Au 
in time domain is calculated according to the formula: 
1 



Aw = -J-Stt log(l - e 2 ) (29) 
s 



Ait = sJ-^log(l-e 2 ). (30) 
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Multivariate matching pursuit (MMP) 

Need to analyze jointly more than one epoch usually arises in case of simultaneous mea- 
surements of more than one signal, or when subsequent epochs are treated as realizations 
of the same process (usually via some kind of ergodicity assumption). In EEG/MEG 
analysis these two cases usually refer to: 

1. Simultaneous recording of EEG/MEG from more than one electrode/sensor, called 
hereafter "channels". 

2. EEG/MEG recording of subsequent time-locked responses to repetitive stimuli 
(event-related potentials or fields, ERPs/ERFs), called hereafter "trials". 

Several versions of the MMP algorithm were separately proposed for particular 
applications — we will briefly discuss this issue in section Discussion. In this paper we 
propose a general framework, where variants of the MMP are identified on the basis of 
mathematical formulation of the two crucial elements of the multivariate algorithm: 

A. Inter-channel/inter-trial constrains, that is parameters that we keep constant or 
allow to change across the channels/trials. 

B. Criteria for selection of the function from the dictionary in each MMP iteration— in 
MMP, contrary to MP, this function is fitted to many epochs at the same time, and 
optimality of fit to many signals at once can be expressed in different ways. 

Such an universal approach allows for a direct implementation of the same code to a 
multitude of different paradigms encountered in recordings of any multivariate time 
series, not only EEG/MEG. Nevertheless, through this paper we will stick to "channels" 
and "epochs" in relation to the organization of the multivariate datasets. The following 
chapters define basic variants of MMP. 

MMP1 (max sum of the moduli of products, constant phase) 

The most straightforward multivariate extension of the MP— let's call it MMP1— can be 
achieved as follows: 

Al. Only the amplitude varies across channels. 

Bl. We maximize the sum of products in all the channels. 

We maximize the sum of moduli of the products rather than their squares, as proposed in 
[48], due to the more efficient selection of the common phase for all the channels. Also, 
owing to the Gabor update formula (38), in all the iterations but the first one we compute 
the products of dictionary's atoms with the function from the previous iteration, which 
has all the parameters fixed except for the amplitude. This saves a lot of computations 
comparing to the case of computing products with all the channels residua separately, as 



Let us denote the multichannel signal as x, and the signal in the ith channel as x', with 
i = 1 . . . N c , where N c is the number of channels. We may express the condition (Bl) for 
the choice of atom g Y in «-th iteration as 



in MMP3. 




(31) 
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The whole procedure can be described as 
R°x = x 

R"x l = {R n x l ,g Yn )g Yn + R " +1 ^ ( 32 ) 
g Y „ = argmax^ e£) ^i \(R n ^,g Y )\ 

Results of MMP1 are given in terms of functions g Yn , selected in consecutive iterations, 
and their weights in all the channels, determined for channel i by the real-valued products 
{R"x l ,g Yn ). In each iteration, the multichannel residuum R" +1 x is computed by subtract- 
ing from the previous residua in each channel i the contribution of g Yn , weighted by 

MMP2 (max modulus of the sum of products, constant phase) 

Assumption of invariant phase in all the channels was explored in [22] to yield an effi- 
cient decomposition algorithm. If we modify the criterion of choice from the previous 
section to 



max 

gy£D 



N c 



(33) 



we get the conditions: 
A2. Only the amplitude varies across the channels. 

B2. We maximize the absolute value of the sum of products across channels. 

Due to the linearity of the residuum operator 7? [22] , this choice allows for implementing 
a simple trick. Instead of finding in each step the product of each dictionary's waveform 
with all the channels separately, and then computing their sum (33), in each step we 
decompose the average signal x 

1 Nc 

x= — y"x ; (34) 



R°x = x 

R"x={R"x,g Y „)g Y „+R" +1 x (35) 
g Y „ = arg max gy eD \(R"x,g Y )\ 

This procedure yields a computational complexity close to the monochannel MP — 
compared to MMP1, reduced by the factor N c (that is, number of channels). 

Convergence of this procedure may be relatively slower for waveforms appearing in 
different channels with exactly opposite phases. 

Due to operating on the average of channels, this version of the algorithm cannot be 
directly applied to the data presented in the average reference (montage). These problems 
are absent in MMP1 as well as in the next implementation, allowing for arbitrary phases 
across the channels. 



MMP3 (variable phase) 

A3. Phase and amplitude vary across the channels. 

B3. We maximize the sum of squared products (energies) across channels. 
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Again, as in (31), we maximize 



N c 

maxV|(R"x',4)| 2 (36) 

but this time £ are not the same g Yn for all channels i— they can have different phases. 
R°x = x 

R n x l = {R n yi,i v X Yn + R n+1 x l (37) 
^„=argmax gyefl E^il(«4>| 2 

As presented in the Appendix (section Optimal phase of a Gabor function), comput- 
ing an optimal phase of Gabor function gy, maximizing absolute value of the product 
(R n x l ,g y ), can be implemented very efficiently. Value of (31) for phases optimized sepa- 
rately will never be smaller than in the case of the phase common to all the channels, so 
this freedom should improve the convergence. 

MMPXY 

In case of multichannel recordings of event-related potentials (ERPs), we are dealing 
with a slightly more complicated structure of the data. Instead of a vector of signals x l , 
where i indexes channels/sensors, we get a matrix of signals X' k , with additional index k 
reflecting subsequent repetitions (single trials). On such data, MMP can be performed 
across either of the two dimensions i and k. For the sake of simplicity, we will call these 
indices "channels" and "trials", although for the algorithm they can represent any direc- 
tion of a multidimensional dataset. The algorithm does not contain any problem-specific 
optimizations and as such preserves generality. 

Usually, the first index in multiplexed multichannel data is the channel (sensor) i and 
the second is the time index t. Number of trial k comes next, and indexes the the set of 
whole epochs and all channels i, related to the £~-th repetitions of an event (usually ERP). 
MP operates on epochs indexed by discrete time t. MMP will operate on channel and trial 
indices i and k, allowing for different constrains across sensors or repetitions. 

For the mp5 package we assumed the naming convention in the form MMPXY, where 
X denotes the version of MMP algorithm used in each iteration across the channel 
dimension, and Y— across the repetitions: 

MMP11: For each channel and trial fit the optimally with (p = const. 
MMP12: Average trials k in each channel i and fit optimally with <j> = const to these 
averages. 

MMP21: For each trial k find the average across channels i and fit optimal^ with 

<p = const to these averages. 
MMP23: For each trial k find the average across channels i and fit to these averages 

optimal^ with potentially different phase (j) in each channel. 
MMP32: For each channel i find the average across trials k and fit to these averages 

optimal^ with potentially different phase <p in each trial. 
MMP33: For each channel and trial fit the optimal g y with potentially different phase (j> 

in each channel and trial. 



For example, MMP12 denotes a setting where in each iteration the choice of the atom 
fitting best all the channels will be effectuated according to MMP1, while across the trials 
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MMP2 will be used. That is, in each iteration the residua of single trials are averaged 
separately in each channel, and to these averages the bestgy is fitted according to MMP1. 
Naming of the algorithms (MMPX) corresponds to the three above subsections. MMP13 
and MMP31 were not implemented in mp5. 

Discussion 

Resolution of MP 

One of the problems, faced by everyone applying MP to exploratory analysis of signals, is 
"how big should be the dictionary so that I do not miss some important structures?" Of 
course "the bigger the better", but increasing the size of the dictionary increases signifi- 
cantly the computational cost, and the exact gain in resolution of MP representation was 
not known so far. Optimal Gabor dictionaries, introduced in this paper, for the first time 
allow to relate the density of the dictionary to the maximum error of a single iteration of 
the algorithm. 

Details of implementation 

This paper describes mathematical foundations and numerical optimizations, imple- 
mented in the freely available software package mp5, developed at the University of 
Warsaw for decomposition of signals using multivariate matching pursuit. This software 
made possible several published works [24,28,29] plus some studies in progress. It builds 
on another decade of experience in using the previous, monovariate version of the algo- 
rithm which we made freely available as mp4 at http://eeg.pl/mp, and used in over a 
dozen published studies. Although all these cited works would not be possible without 
this software, they were published as "classical scientific papers" where author is supposed 
to concentrate only on the scientific novelty, not tools. There was no space for all the 
important details of the implementations, which make a big difference in the results and 
hence constitute the core of the Reproducible Research. Therefore, although the source 
code of both these packages have been freely available for years, a complete descrip- 
tion of employed mathematical and numerical optimizations and tricks, as presented in 
Appendix Implementation and optimizations, was not published up to now. 

Applications of MMP to EEG/MEG 

Probably the most promising field of future applications is related to the multivari- 
ate matching pursuit (MMP). Real world applications were made possible owing to 
the progress in computer hardware in recent decades. In section Multivariate matching 
pursuit (MMP) we propose, in concordance with [44], a simple classification of MMP 
variants according to the combinations of the constraints on parameters and criteria of 
choice. However, there is a potential continuum of other approaches outside of this sim- 
ple scheme, tailored very specifically for particular applications. Most of them suffer from 
the need of arbitrary setting some parameter that may significantly change the results of 
decomposition. In this light we believe that in most cases the simple approach proposed 
in section Multivariate matching pursuit (MMP), which is free from arbitrary parameters, 
is optimal and most elegant, at least for starting. A procedure that is free of task-specific 
settings has also obvious advantages stemming directly from its generality. 

However, specific approaches of course may offer addition advantages in particu- 
lar cases. For example, MMP tailored for the analysis of stereo recordings of sound 
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in [49] allows for different time positions of the time-frequency atoms present in the two 
channels. Together with different amplitudes in each channel, it relates to modeling the 
microphones as gain-delay filters in the anechoic case. Unfortunately, a model explaining 
relations between channels of EEG/MEG recordings is far more complicated, even in the 
case of known distribution of sources (so-called forward EEG problem). 

An attempt to incorporate constraints, reflecting the generation of multichannel EEG, 
into the MMP procedure, was presented in [26]. To the purely energetic criterion of 
MMP1 (31), a second term was added to favor those g y which give smooth distribution 
of amplitudes across the channels. Spatial smoothness (quantified by Laplacian opera- 
tors) means basically that the values of (R"x l ,g y ) should be similar for i corresponding to 
the neighboring channels. However, a choice combining two completely different crite- 
ria requires some setting of their relative weights. For example, if we attribute too much 
importance to the spatial criterion, in favor of the energetic one, we may obtain atoms giv- 
ing very smooth scalp distributions across electrodes. But in such a case the convergence 
of the MMP procedure, measured in the rate-distortion sense, relating to the amount of 
explained energy, may be severely impaired. Up to now, no objective or optimal settings 
for regulating the influence of such extra criteria on the MMP algorithms was proposed. 

Finally, we should also point out some of the possible novel and promptly imple- 
mentable applications of MMP. Variants of the multivariate algorithms, described in 
section Multivariate matching pursuit (MMP), can be related to several models of multi- 
channel recordings of repetitive trials of evoked brains activity. Algorithms developed for 
parameterization of EEG structures in subsequent channels [22,24] have been be applied 
to decomposing subsequent trials of event-related potentials [28,29]. 

Ongoing works include application of MMP3 to evoked potentials, where variable phase 
accounts for the variable latencies, and instantaneous decomposition of both repetitions 
and channels of event-related potentials, with some of the discussed constraints applied 
separately to the relevant dimensions. 

Apart from that, MMP3 may be also used to compute estimates of the phase lock- 
ing factor [50] (also called inter-trial coherence, [51]). Simultaneous decomposition of all 
the repetitions will be crucial in this case: in separate MP decompositions of subsequent 
trials, atoms representing possibly the same structures can have slightly different frequen- 
cies, which makes their relative phase insignificant. By estimating the phase coherence 
only in those are of the time-frequency plane, where indeed an oscillatory activity appears, 
we may get rid of a lot of noise blurring previously applied estimates. 

Software availability and requirements 

Software package described in this article is freely available from http://braintech.pl/ 
svarog/. It can be run on a computer with a reasonably recent version of MS Windows, 
Mac OS or GNU/Linux with Java runtime environment. Screencasts (video files), show- 
ing (1) downloading and configuration of the package and (2) MP decomposition process 
of a sample EEG epoch via Svarog, are included as [Additional file 1] and [Additional 
file 2]. These videos can be also viewed in a variety of formats embedded in HTML5 at 
http://braintech.pl/svarog. [Additional file 3] contains a snapshot of Svarog's help related 
to mp5. 

Complete source code for the MMP engine written in C is available from http://git. 
braintech.pl/matching-pursuit.git. GUI is implemented in Java within the Svarog system, 
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for which the source code is available at http://git.braintech.pl/svarog.git. Both projects 
are released on terms of the General Public License (GPL). 

Svarog is a loose acronym for Signal Viewer, Analyzer and Recorder on GPL, and con- 
stitutes the core of the world's first professional EEG recording and analysis system based 
entirely on GPL software (http://braintech.pl/Manifesto.html). This multiplatform sys- 
tem is developped primarily for GNU/Linux. Current versions of the system, including 
the above discussed software plus the OpenBCI framework for brain-computer interfaces 
and a modified version of the PsychoPy framework for experiments design, are available 
as packages for Ubuntu and Debian (see http://deb.braintech.pl). 

Appendix 

Implementation and optimizations 
Product update formula 

This optimization was proposed in [39]. 
Let us recall from (1) the formula for the wth residuum: 

R"x={R"x,g r „)g rn +R n+1 x 

Taking the product of both sides withg yi , which is the candidate for selection in the next 
iteration, we get 

(R n+1 x,g Yi ) = (R n x,g n ) - (R n x,g Yn )(g yn ,g Yi ) (38) 

This equation expresses the product of a dictionary function g Yi with the residuum in 
step n + 1 using two products, which were already calculated in the previous iteration — 
(R"x,g Yi ) and (R"x,g Yn )— and a product of two functions from the dictionary— {g Yn ,g Yi }- 
Therefore, the only thing that remains to be computed is a product of two known 
functions. 

Inner product of continuous Gabor functions can be expressed in terms of elemen- 
tary functions (see [47,52]). Unfortunately, it does not reflect with enough accuracy the 
numerical value of the product of two discrete vectors, representing sampled versions 
of the same Gabor functions. Exact formula for the product of the latter involves theta 
functions, which can be approximated by relatively fast converging series [47]. 

Sin, Cos, and Exp: fast calculations and tables 

In spite of the trick from the previous section, still — at least in the first iteration — we need 
to compute the "plain" inner products of the signal with Gabor functions. Using the result 
from section Optimal phase of a Gabor function, of all the phases (p we calculate products 
only for </> = 0 and </>=§■ 

Computationally, the most expensive part is not the actual calculation of the inner 
products, but the generation of discrete vectors of samples from Equation (3), which 
contains cosine and exponent. Compilers usually approximate these functions by high- 
order polynomials. Although contemporary CPUs may implement directly some special 
functions, they will still be much more expensive to compute than basic additions or mul- 
tiplications. Therefore, avoiding explicit calls of these functions may result in significant 
acceleration — together with tabularization, it accelerated the MP implementation [45] by 
over an order of magnitude. In the following we show (after [53]) how to fill in a vec- 
tor with values of sines and cosines for equally spaced arguments using only one call of 
these functions. 
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Since the time t in (3) in the actual computations is discrete, the trick is to com- 
pute sm{co{t + 1)) knowing sin(cot). Using the trigonometric identity (59) with its 
corresponding form for the sine function, we get 

sm{co{t + 1)) = s'm(cot + to) = cos(cot) sin(&>) + s'm(cot) cos(cd) (39) 
cos(co(t + 1)) = cos(cot + to) = cos(cot) cos(a>) — sm(cot) sin(&>) (40) 

We start with t = 0, setting cos(O) = 1 and sin(O) = 0, and computing constants 
cos(cw) and sin (to). Values of (39) and (40) for subsequent t can be filled in a recursive way, 
using the computed cos(&>) and sin(eo) and taking as sm{cot) and cos(cot) values from the 
previous steps. 

A similar approach can accelerate computation of the factors e~ at2 present in (3): 

e -«(*+D 2 = e -af-2at-a = - a ? -l*t -a (41) 

To compute (41) we need e~ at2 from the previous iteration, constant e~ a independent of 
t, and e~ 2at . The last factor can be updated in each iteration at a cost of one multiplication: 
to get e -2a ( t+1 ) from e ~ 2at we multiply it by a precomputed constant e~ 2a . 

In all these cases we also take into account the symmetries sin(— x) = — sin{x), 
cos(— x) = cos{x), and e~ ( ~ x ^ = e~* 2 to double the savings. Values of these vectors can 
be stored in memory for subsequent calculations of Gabor vectors (3) with different com- 
binations of sin/cos and exp, but only if we restrict the discretization of parameters to 
some integer grid, for example: 

u = l...N, co = (l...N)jt/N, s = l...N (42) 

Apart from fast Sin, Cos and Exp function generation, the optimal dictionary allows for 
saving in the computer memory the tables with values of these functions. The number N s 
of different scales in an optimal dictionary is (see Equation (27)): 

N s = |_log a ATJ (43) 

where N is the number of samples in signal and a is the parameter expressed by Equation 
(27). A typical epoch of EEG/MEG contains some thousands samples, so it is possible 
to store all EXP functions in computer memory. Due to the fact, that Gabors, for given 
scale, are arranged in frequency domain in increments of Aco (29) one can save also in 
computer's memory one period of Sine/Cosine signal of the lowest frequency. The sine 
and cosine signal of higher frequencies, for example k x Aco, where k is the natural num- 
ber, can be generated by means of selecting every /c-th sample from Cos/Sine signal of 
frequency Aco stored in the computer's memory. 

Fast detection of two orthogonal Gabors 

Update formula (section Product update formula) in combination with optimal dictio- 
nary allows for uses the next numerical optimization— fast assessment of orthogonality of 
two Gabors functions. Let us analyze the analytical formula for an inner product of two 
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Gabors. After substituting to the Equation (18) the exact expression for normalization 
factor K y and constant C, and introducing the following factors: 

( (fflQ+a>l) 2 
cos [(coo + a>i)B + (0o + 0i) - (&>oWn + &>iwi)] e H 

+ cos [(w 0 - «i)-B + (</>o - 0i) - («oMo - viui)] e M 

2 6 / 4 

Y = KoKiJ^i = = (44) 



[7 «w 

/ ( 1 + cos(20o)e _ Tir ) ( 1 + cos(20!)e-T^ J 



/ 71 , x-i C / 5 ° 5 1 _7r s 2 +s 2 

Z = V^ (5152) 2e = \«&^A) e 01 

one can obtain: 

(gb.ft)=Jf-^--Z (45) 

It is straightforward to estimate that value of factor X fulfils the condition 

\X\ < 2 (46) 

for every possible set of parameters {so,si,d)o,wi,(po,(pi,uo,ui}. The maximal value of 
parameter Y is limited by the lowest values of scale s and frequency w, which, based on 
(29), are: 

^ = Smin > 0 

(47) 



= Aco= ^y_log(l- c 2) 



Smin 

and phases 0 O = </>i = ±f ■ Substituting above formulae into factor 7 results in following 
expression: 

in < 23/2 < 23/2 « ^ 

— ' — 1(1 1\ ^ 2 

(l + cos(20 o ) (1 - e 2 ) 2 ) (l + cos(20i) (l - e 2 ) 2 ) 

(48) 

Based on this observation, it is possible to introduce a numerical threshold, defining an 
approximate orthogonality of two Gabor functions. The factor Z depends on the relative 
position and the width of two Gabors. Atoms which differ mostly in these parameters will 
give a small factor Z. Therefore 

2 /2 

-J-Z < n =► \XYZ\ < n =► ( M ) » 0 (49) 

where jy can be set for example at the accuracy of a double precision number (10~ 16 ). 
Condition (49) allows for efficient detection of orthogonal atoms in dictionary and replac- 
ing their inner product by zero in Equation (38). Moreover, in case of a dictionary with 
uniform step Aco at a given scale, it is possible to determine the set of Gabor functionss, 
characterized by the same position u, for which inner product with the Gabor selected in 
previous iteration will be zero. 
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Limiting domain of the product 

When Equation (49) is not fulfilled, the two Gabor functions cannot be treated as orthog- 
onal, and their product has to be determined. Full scalar product of two Gabor functions 
g\ and g2 can be written as 

/+oo 2 
e -A(t-B) e C CQS ^ t _ Ul} + cos ( W2 ( f _ M2 ) + 0 2 ) dt (50) 
-00 

where A, B and C are defined in Equations (15-17). 

To perform numerical integration, one can replace the improper integral above with a 
definite integral on a sufficiently large interval [a; b], so that 

/+oo pb 
gi(t)g 2 (t) dt - gi(t)g 2 {t)dt <r) (51) 
-oo J a 

for given error bound rj. Such interval can be constructed as [B — A; B + A] to fulfil 



\Jb+a 



< -rj 
2 1 



1/ 



(52) 

B—A 



gi(t)g2(t)dt 



2 1 



Therefore, A can be calculated as 

where erf -1 is an inverse of the error function 
2_ f x _a 

The value of A in (53) is well-defined unless inequality (49) is fulfilled, in which case 
the atoms are orthogonal to the given precision and there is no need to define integration 
interval. 

To simplify formula (53), one can use inequality 



2 C x 2 

erf(x) = — e" f dt . (54) 
lit Jo 



erf^l -x) < y/- \ogx, (55) 
which is fulfilled for all* € (0; 1]. Therefore, 



is guaranteed to fulfil A' > A. 
Optimal phase of a Gabor function 

In the following we find an explicit formula for the phase (/> max , that maximizes the prod- 
uct of signal x with Gabor function of given time position u, frequency co, and scale s. Let 
us recall the formula (3) of a real Gabor function: 

g y (t) = iC(K)e" 7r (^) 2 cos (w(t - u) + <j>) . 

y denotes the set of parameters y = {u, s,ct>,(f>} and K(y ) is such that \ \g Y \\ = 1. Writing 
K(y ) explicitly gives 

e" 71 ^) 2 cos (co(t-u) + cP) 

8(r)W = -7=^2 (57) 

We'^y—' cos (co(t-u) + 0)| | 
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Phase shift 0 can be also expressed as a superposition of two orthogonal oscillations 
cos (co(t — u) + <j>) and sin (co(t — u) + </>). We define 

C = e" 7r ( £ ?) 2 cos (w(t-u)) 

(58) 

S = e -7r (^) sin(ftj(i- w)) 
and, using the trigonometric identity 

cos(a + 0) = cos a cos 0 — sin a sine/), (59) 

we write the Gabor function (57) as 
C cos 0 — S sin 0 

gy (*) = ^ . „ . ... (60) 

|| C cos0 — 6 sin <p || 

Using ||*|| 2 = (x,x), and the orthogonality of C and S defined in (58) ({C,S) = 0), we 
write the product of Gabor function defined as (60) with the signal x as 

(x, C) cos 0 — (x, S) sin 0 (x, C) — (x, S) tan 0 

(*'£r> = — ^ t — c . ,,, — = , (61) 

|| C cos 0-S sin 0|| 7(C, C> + (S,S) tan 2 0 

We are looking for the maximum absolute value of this product. For the sake of 
simplicity we will maximize (x,g Y ) 2 instead of \(x,g y )\. Denoting v = tan 0 

2 ((x,C)-(x,S)v) 2 

(x,g r ) = Ir rx - /c c . 2 (62) 
(C, C) + (S,S)v z 

To find v which maximizes (62), we look for zeros of the derivative ^{x,g Y ) 2 and find 
two roots: 

(x,C) {x,S)(C,Q 

v\ = , vi = 

(x, S) (x, C) (S, S) 

Substituting these values for tan 0 in (61) we get 

(X,gy)\ V=n = 0 

, 0 , (x,S) 2 <C,C) 

/ lv=V2 



(«,S) 2 (C,C) 2 
(x,C) 2 (S,S) 

Since for vi the square of the product is minimum (zero), the other extremum is a 
maximum in vi- Therefore, the phase 0 that maximizes (x,g Y ) 2 is given by 
(x,S)/(S,S) 

0max = arctan (63) 
(x, C)/(C, C) 

and the maximum value of the product is 

. . (x,C) 2 /(C,C) + (x,S) 2 /(S,S) 

(X,gy)max = , (64) 

J(x,C) 2 /(C,C) + (x, S) 2 /(S, S) 
Applying Fast Fourier Transform 

Estimation of the product of a Gabor function with signal x (61) requires computing of the 
inner product of signal x with functions C and S defined in Equation (58). Let us rewrite 
the formulae for < x, C > and < x, S >: 

*e _7r( ^) cos(w(£-w)) (65a) 

-00 

xe - n (^r> sm{w{t- u)) (65b) 

-00 
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Multiplying (65a) by i = and substracting it from (65b), one can obtain the following 
equation: 

/oo 2 
e -Mt-u) (66) 
-oo 

The right side of Equation (66) is the Fourier Transform of signal x windowed by Gauss 
functions e _7r *-~-' . This formula allows for fast computation of inner products < x, C > 
and < x, S >, since in and optimal dictionary the atoms with the given scale s are arranged 
in frequency domain with uniform step Aco (see Equation (29)). 

Additional structures in the dictionary 

Apart from the Gabor functions, Gabor dictionary implemented in mp5 contains also the 
following functions: 

• "Pure" harmonic waves 

S(t) = K s cosM + (j>) (67) 

where K s is normalization factor such that (S(t), S(t)) = 1 on the analysed signal 
length. The phase <p of the signal S(t) is estimated according to Equation (63). 
Harmonic functions are distributed in frequency domain with step Aa> determined 
by Equation (29); 

• Kronecker delta functions 

1, for t = u 



A{t - u) = 



(68) 
0, for t ^ u 



In this work, the delta functions are distributed across the whole time domain, that is 
for each point of the time series. 
• "pure" Gaussians 

, (t-«) 2 

G(t) = K g e * 2 (69) 

Where Kg is normalization factor such that (G(t), G(t)) = 1 on the analysed signal 
length. These functions are distributed in the scale domain with step a (see Equation 
(27)) and with step Au (30) in the time domain. 

Additional files 



Additional file 1 : A video tutorial for downloading and configuration of the Svarog package, used for 
computations and visualization of results. 

Additional file 2: Shows the steps from loading the signal into Svarog to displaying interactive map of 
time-frequency energy distribution. 

Additional file 3: Contains help of the MP module from Svarog. 
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